[1]:
# PyData libraries
import numpy as np
rng = np.random.default_rng(seed=42)
from numba import njit
from scipy.spatial.distance import squareform
from scipy.io import loadmat
from sklearn.manifold import Isomap
from hdbscan import HDBSCAN

# Persistence tools
from gtda.homology import VietorisRipsPersistence
from gtda.plotting import plot_point_cloud
from steenroder import barcodes, rips_barcodes

# Plotting
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
matplotlib.rcParams['text.usetex'] = True
matplotlib.rcParams['font.family'] = "serif"
matplotlib.rcParams['font.style'] = "normal"
matplotlib.rcParams['font.variant'] = "normal"
matplotlib.rcParams['font.serif'] = "Computer Modern Roman"

This dataset of 6040 samples from the configuration space of the cyclo-octane molecule \(\text{C}_{8} \text{H}_{16}\) is described in https://www.frontiersin.org/articles/10.3389/frai.2021.668302/full#B48 as follows:

What do we mean by “global shape”? Consider, for example, conformations of the cyclo-octane molecule \(C_8 H_{16}\), which consists of a ring of eight carbons atoms, each bonded to a pair of hydrogen atoms (see Figure 4, left). The locations of the carbon atoms in a conformation approximately determine the locations of the hydrogen atoms via energy minimization, and hence each molecule conformation can be mapped to a point in \(\mathbb{R}^{24} = \mathbb{R}^{8 \cdot 3}\), as the location of each carbon atom can be specified by three coordinates. This map realizes the conformation space of cyclo-octane as a subset of \(\mathbb{R}^{24}\), and then we mod out by rigid rotations and translations. Topologically, the conformation space of cyclo-octane turns out to be the union of a sphere with a Klein bottle, glued together along two circles of singularities (see Figure 4, right). This model was obtained by Martin et al. (2010), Martin and Watson (2011), and Brown et al. (2008), who furthermore obtain a triangulation of this dataset (a representation of the dataset as a union of vertices, edges, and triangles).

[2]:
cyclo_octane = loadmat("../data/pointsCycloOctane.mat")['pointsCycloOctane']
cyclo_octane.shape
[2]:
(6040, 24)

The circles of singularities can be found by several methods. Traditionally, local PCA was used. Here, we use a singular set found by using methods rooted in local persistent cohomology and used in https://www.pnas.org/content/117/33/19664:

[3]:
singular_indices = loadmat("../data/singularity_indicesCycloOctane_PH0.5.mat")['singularity_indices_PH'].flatten() - 1
nonsingular_indices = np.array([x for x in range(len(cyclo_octane)) if x not in singular_indices])
print(f"{len(singular_indices)} points in the singular set")
627 points in the singular set

Let us store the non-singular portion for later use

[4]:
nonsingular = cyclo_octane[nonsingular_indices]

A dimensionality reduction algorithm such as Isomap can help us visualize the full dataset in 3D:

[5]:
isomap = Isomap(n_components=3).fit_transform(cyclo_octane)
[6]:
plot_point_cloud(isomap, plotly_params={"trace": {"marker_size": 1},
                                        "layout": {"height": 600, "width": 600}})

We can also visualize the singular and non-singular portions:

[7]:
isomap_nonsingular = isomap[nonsingular_indices]
isomap_singular = isomap[singular_indices]
[8]:
plot_point_cloud(isomap_nonsingular, plotly_params={"trace": {"marker_size": 1},
                                                    "layout": {"height": 600, "width": 600}})

We can see that the singular set does look like two circles in the Isomap projection:

[9]:
plot_point_cloud(isomap_singular, plotly_params={"trace": {"marker_size": 1},
                                                 "layout": {"height": 600, "width": 600}})

From previous literature, we expect the nonsingular part of the data to consist of the union of a 2-sphere and a Klein bottle. Hence, we partition the non-singular part using the clustering algorithm HDBSCAN:

[10]:
HD = HDBSCAN(min_samples=2, min_cluster_size=300, alpha=1., cluster_selection_epsilon=0)
HD.fit(nonsingular)
[10]:
HDBSCAN(cluster_selection_epsilon=0, min_cluster_size=300, min_samples=2)

With these hyperparameters, the portion we are interested in turns out to be the cluster labelled as 3 by HDBSCAN:

[11]:
mask_klein = HD.labels_ == 3
isomap_klein = isomap_nonsingular[mask_klein]

plot_point_cloud(isomap_klein, plotly_params={"trace": {"marker_size": 1},
                                              "layout": {"height": 600, "width": 600}})

The remaining three clusters (that is, excluding the cluster of noisy points labelled by HDBSCAN as -1) should presumably make up a 2-sphere. Indeed, that would seem clear in the Isomap embedding:

[12]:
mask_sphere = np.isin(HD.labels_, [0, 1, 2])
isomap_sphere = isomap_nonsingular[mask_sphere]

plot_point_cloud(isomap_sphere, plotly_params={"trace": {"marker_size": 1},
                                               "layout": {"height": 600, "width": 600}})

We can obtain more evidence that these candidate components are indeed a Klein bottle and a 2-sphere by computing persistent homology of the corresponding 24-dimensional point clouds.

[13]:
klein = nonsingular[mask_klein]
sphere = nonsingular[mask_sphere]

Preliminary checks – regular persistence barcodes

Computing the persistent homology barcode (without a threshold) up to and including homology degree 2 is quite computationally challenging on the full components. We can use a threshold or subsample the point clouds. In the first case, a threshold of 1.5 suffices to demonstrate that the putative Klein bottle component has the correct mod 2 persistent homology, with two long bars in \(H_1\) and one long bar in \(H_2\).

WARNING: VERY LONG COMPUTATION

[14]:
max_edge_length = 1.5

VR = VietorisRipsPersistence(homology_dimensions=(0, 1, 2),
                             max_edge_length=max_edge_length,
                             collapse_edges=True,
                             infinity_values=np.inf)
[ ]:
dgm_klein = VR.fit_transform_plot(
    [klein],
    plotly_params={"traces": {"marker_size": 3},
                   "layout": {"title": f"Klein bottle component, thresholding at {max_edge_length}"}}
    )[0]
[ ]:
three_longest_bars = np.argsort(dgm_klein[:, 1] - dgm_klein[:, 0])[-3:]
two_longest_H1 = []
for bar in dgm_klein[three_longest_bars]:
    if bar[2] == 1:
        two_longest_H1.append(bar[:2])
    elif bar[2] == 2:
        longest_H2 = bar[:2]

print(f"""The two longest H_1 bars in the Klein bottle component are:
  {two_longest_H1[0]}
  {two_longest_H1[1]}

The longest H_2 bar is:
  {longest_H2}
""")

With the 2-sphere, we can use a larger threshold and see that there is a long bar in \(H_2\).

WARNING: VERY LONG COMPUTATION

[ ]:
max_edge_length = 2
VR.set_params(max_edge_length=max_edge_length);
[ ]:
dgm_sphere = VR.fit_transform_plot(
    [sphere],
    plotly_params={"traces": {"marker_size": 3},
                   "layout": {"title": f"Sphere component, thresholding at {max_edge_length}"}}
    )[0]

Stronger evidence when subsampling

By subsampling the point clouds, we can garner evidence that nothing else of note would appear in either component if we removed the threshold completely.

WARNING: LONG COMPUTATIONS

[ ]:
n_samples = 800

random_idxs_klein = rng.choice(np.arange(len(klein)), n_samples, replace=False)
random_idxs_sphere = rng.choice(np.arange(len(sphere)), n_samples, replace=False)
klein_subsampled = klein[random_idxs_klein]
sphere_subsampled = sphere[random_idxs_sphere]
[ ]:
max_edge_length = np.inf
VR.set_params(max_edge_length=max_edge_length);
[ ]:
dgm_klein_subsampled = VR.fit_transform_plot(
    [klein_subsampled],
    plotly_params={"traces": {"marker_size": 3},
                   "layout": {"title": f"Klein component, {n_samples}-point subsample, no threshold"}}
    )[0]
[ ]:
dgm_sphere_subsampled = VR.fit_transform_plot(
    [sphere_subsampled],
    plotly_params={"traces": {"marker_size": 3},
                   "layout": {"title": f"Sphere component, {n_samples}-point subsample, no threshold"}}
    )[0]

Steenrod barcodes

We use the rips_barcodes utility function for computing ordinary and Steenrod barcodes of (thresholded) Rips filtrations, starting from a point cloud in this case.

Relative cohomology, full Klein bottle component, threshold at \(R = 1.2\)

We now construct a Rips filtration of the alleged “Klein bottle component”. We set a threshold at 1.2, at which we know that the two \(H_1\) bars and the \(H_1\) bar are all alive. We compute relative (ordinary and Steenrod) barcodes.

[ ]:
X = klein
max_edge_length = 1.2
barcodes_kwargs = {
    "distance_matrix": False,
    "max_edge_length": max_edge_length,
    "k": 1,  # Compute Sq^1
    "max_simplex_dimension": 3,
    "absolute": False
}

barcode, steenrod_barcode = rips_barcodes(X, **barcodes_kwargs, verbose=True)
[ ]:
n_dims = len(barcode)

lifetime_thresh = 0.2
eps = 0.01
min_filtration_value = 0

fig, (ax_rel_coho, ax_st) = plt.subplots(2, 1,
                                         figsize=(16, 8),
                                         sharex='col',
                                         gridspec_kw={'height_ratios': [2, 1]},
                                         tight_layout=True)

colors = ["Orange", "Green", "Blue", "Red"]
labels_rel_coho = [r"$\mathcal{H}^0_R$",
                   r"$\mathcal{H}^1_R$",
                   r"$\mathcal{H}^2_R$",
                   r"$\mathcal{H}^3_R$"]
labels_st = [r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^0_R$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^1_R$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^2_R$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^3_R$"]

counter = 0
for dim in range(1, n_dims):
    segs = []
    multiplicities = {}
    dgm = barcode[dim]
    dgm = dgm[dgm[:, 1] - dgm[:, 0] > lifetime_thresh]
    for p in dgm:
        if tuple(p) in multiplicities:
            multiplicities[tuple(p)] += 1
        else:
             multiplicities[tuple(p)] = 1

    counter_now = counter
    for i, (k, v) in enumerate(multiplicities.items()):
        death, birth = k
        y = - (counter_now + i)
        if death == -np.inf:
            ax_rel_coho.arrow(min_filtration_value - eps, y, -0.0000001, 0,
                              head_starts_at_zero=False, width=0, head_width=0.2, head_length=0.015,
                              color=colors[dim], ec=colors[dim])
            death = min_filtration_value - eps
        segs.append([[birth, y], [death, y]])
        if v > 1:
            ax_rel_coho.annotate(f"{v}", (death, y + 0.2))
        counter += 1

    segs = np.array(segs, dtype=np.float64)
    if len(segs):
        line_segments = LineCollection(segs, linewidths=2,
                                       colors=colors[dim],
                                       label=labels_rel_coho[dim],
                                       linestyle="solid")
        ax_rel_coho.add_collection(line_segments)

    counter += 1

ax_rel_coho.axvline(x=max_edge_length, color="gray", alpha=0.3)
ax_rel_coho.text(max_edge_length, y, rf"thresh = {max_edge_length}", rotation=90, fontdict={"fontsize": 15})

ax_rel_coho.autoscale()
ax_rel_coho.get_yaxis().set_visible(False)
ax_rel_coho.legend(loc="upper right", fontsize=18)
ax_rel_coho.margins(y=1/counter)
ax_rel_coho.set_title("Persistent relative cohomology barcode", fontdict={"fontsize": 22}, pad=15)

counter = 0
for dim in range(n_dims):
    segs = []
    multiplicities = {}
    dgm = steenrod_barcode[dim]
    for p in dgm:
        if tuple(p) in multiplicities:
            multiplicities[tuple(p)] += 1
        else:
             multiplicities[tuple(p)] = 1

    counter_now = counter
    for i, (k, v) in enumerate(multiplicities.items()):
        death, birth = k
        y = - (counter_now + i)
        if death == -np.inf:
            ax_st.arrow(min_filtration_value - eps, y, -0.0000001, 0,
                        head_starts_at_zero=False, width=0, head_width=0.1, head_length=0.015,
                        color=colors[dim], ec=colors[dim])
            death = min_filtration_value - eps
        segs.append([[birth, y], [death, y]])
        if v > 1:
            ax_st.annotate(f"{v}", (death, y + 0.2))
        counter += 1

    segs = np.array(segs, dtype=np.float64)
    if len(segs):
        line_segments = LineCollection(segs, linewidths=2,
                                       colors=colors[dim],
                                       label=labels_st[dim],
                                       linestyle="dashed")
        ax_st.add_collection(line_segments)

    counter += 1

ax_st.axvline(x=max_edge_length, color="gray", alpha=0.3)
ax_st.text(max_edge_length, y - 0.65, rf"thresh = {max_edge_length}", rotation=90, fontdict={"fontsize": 15})

ax_st.tick_params(axis="x", labelsize=18)

ax_st.autoscale()
ax_st.get_yaxis().set_visible(False)
ax_st.legend(loc="upper right", fontsize=18)
ax_st.margins(y=1)
ax_st.set_title("Steenrod barcode", fontdict={"fontsize": 22}, pad=15)

# plt.savefig("cyclo-octane_thresh_relative_barcodes.pdf")
plt.show()

Absolute cohomology, subsampled Klein bottle component, threshold at \(R = 1.5\)

We now construct a Rips filtration of our subsample of the alleged Klein bottle component. For computational reasons, we set a threshold again at 1.5, which we know to be larger than the death value of the longest \(H_2\) bar. We compute absolute (ordinary and Steenrod) barcodes.

[ ]:
X = klein_subsampled
max_edge_length = 1.5
barcodes_kwargs["absolute"] = True
barcodes_kwargs["max_edge_length"] = max_edge_length

barcode, steenrod_barcode = rips_barcodes(X, **barcodes_kwargs, verbose=True)
[ ]:
n_dims = len(barcode)

lifetime_thresh = 0.
eps = 0.01
max_filtration_value = max_edge_length

fig, (ax_rel_coho, ax_st) = plt.subplots(2, 1,
                                         figsize=(16, 8),
                                         sharex='col',
                                         gridspec_kw={'height_ratios': [2, 1]},
                                         tight_layout=True)

colors = ["Orange", "Green", "Blue", "Red"]
labels_rel_coho = [r"$\mathcal{H}^0_A$",
                   r"$\mathcal{H}^1_A$",
                   r"$\mathcal{H}^2_A$",
                   r"$\mathcal{H}^3_A$"]
labels_st = [r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^0_A$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^1_A$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^2_A$",
             r"$\mathrm{img}(Sq^1) \cap \mathcal{H}^3_A$"]

counter = 0
for dim in range(1, n_dims - 1):
    segs = []
    multiplicities = {}
    dgm = barcode[dim]
    dgm = dgm[dgm[:, 1] - dgm[:, 0] > lifetime_thresh]
    for p in dgm:
        if tuple(p) in multiplicities:
            multiplicities[tuple(p)] += 1
        else:
             multiplicities[tuple(p)] = 1

    counter_now = counter
    for i, (k, v) in enumerate(multiplicities.items()):
        birth, death = k
        y = - (counter_now + i)
        if death == np.inf:
            ax_rel_coho.arrow(max_filtration_value + eps, y, 0.0000001, 0,
                              head_starts_at_zero=False, width=0, head_width=50, head_length=0.01,
                              color=colors[dim], ec=colors[dim])
            death = max_filtration_value + eps
        segs.append([[birth, y], [death, y]])
        if v > 1:
            ax_rel_coho.annotate(f"{v}", (death, y + 0.2))
        counter += 1

    segs = np.array(segs, dtype=np.float64)
    if len(segs):
        line_segments = LineCollection(segs, linewidths=2,
                                       colors=colors[dim],
                                       label=labels_rel_coho[dim],
                                       linestyle="solid")
        ax_rel_coho.add_collection(line_segments)

    counter += 1

ax_rel_coho.axvline(x=max_edge_length, color="gray", alpha=0.3)
ax_rel_coho.text(max_edge_length, y + 4, rf"thresh = {max_edge_length}", rotation=90, fontdict={"fontsize": 15})

ax_rel_coho.autoscale()
ax_rel_coho.get_yaxis().set_visible(False)
ax_rel_coho.legend(loc="lower left", fontsize=18)
ax_rel_coho.margins(y=1/counter)
ax_rel_coho.set_title("Persistent absolute cohomology barcode", fontdict={"fontsize": 22}, pad=15)

counter = 0
for dim in range(n_dims):
    segs = []
    multiplicities = {}
    dgm = steenrod_barcode[dim]
    dgm = dgm[dgm[:, 1] - dgm[:, 0] > lifetime_thresh]
    for p in dgm:
        if tuple(p) in multiplicities:
            multiplicities[tuple(p)] += 1
        else:
             multiplicities[tuple(p)] = 1

    counter_now = counter
    for i, (k, v) in enumerate(multiplicities.items()):
        birth, death = k
        y = - (counter_now + i)
        if death == np.inf:
            ax_st.arrow(max_filtration_value + eps, y, 0.0000001, 0,
                        head_starts_at_zero=False, width=0, head_width=0.3, head_length=0.005,
                        color=colors[dim], ec=colors[dim])
            death = max_filtration_value + eps
        segs.append([[birth, y], [death, y]])
        if v > 1:
            ax_st.annotate(f"{v}", (death, y + 0.2))
        counter += 1

    segs = np.array(segs, dtype=np.float64)
    if len(segs):
        line_segments = LineCollection(segs, linewidths=2,
                                       colors=colors[dim],
                                       label=labels_st[dim],
                                       linestyle="dashed")
        ax_st.add_collection(line_segments)

    counter += 1

ax_st.axvline(x=max_edge_length, color="gray", alpha=0.3)
ax_st.text(max_edge_length, y - 0.085, rf"thresh = {max_edge_length}", rotation=90, fontdict={"fontsize": 15})

ax_st.tick_params(axis="x", labelsize=18)

ax_st.autoscale()
ax_st.get_yaxis().set_visible(False)
ax_st.legend(loc="lower left", fontsize=18)
# ax_st.margins(y=1)
ax_st.set_title("Steenrod barcode", fontdict={"fontsize": 22}, pad=15)

# plt.savefig("cyclo-octane_subsampled_absolute_barcodes.pdf")
plt.show()